Modeling person-specific development of math skills in 
continuous time: New evidence for mutualism 


Lu Ou
ACTNext by ACT, Inc.
500 ACT Drive
Iowa City, IA 52243
lu.ou@act.org


Abe D. Hofman
University of Amsterdam
Nieuwe Achtergracht 129-B
1018 WS Amsterdam
The Netherlands
a.d.hofman@uva.nl


Vanessa R. Simmering
ACTNext by ACT, Inc.
500 ACT Drive
Iowa City, IA 52243
vanessa.simmering@act.org


Timo Bechger
ACTNext by ACT, Inc.
500 ACT Drive
Iowa City, IA 52243
timo.bechger@act.org


ABSTRACT 


In this study, we fitted a mixed-effects nonlinear continuous- 
time mutualism model of skill development proposed by van 
der Maas et al. (2006) to naturally collected irregularly 
spaced time series data from an online adaptive practice sys- 
tem for mathematics called Math Garden. Results showed 
that the mutualism model provided a better fit to the data 
than a g-factor model. The paper illustrates continuous-time 
modeling of irregularly-spaced multivariate time series data 
that are increasingly prevalent in modern learning systems. 
1. INTRODUCTION


For the past century, generations of researchers have contin- 
ued to pursue explanations for the consistent positive corre- 
lations between diverse sets of cognitive ability tests, known 
as the positive manifold [25, 29]. Heated debates have continued
over whether a potentially biologically based g-factor causes
both the development of general intelligence and the positive
manifold [26, 7, 27, 11, 9]. Although researchers have not
reached consensus, there has been a shift from
conceptualizing cognitive development as merely reflective, 
as in factor analysis, to thinking of it as formative [2, 14, 
27]. In a formative model, the positive manifold is an emer- 
gent property that results from within-person changes and 
connections over time. This ontological stance implies that 
research needs to focus on understanding the causal relation- 
ships that underlie cognitive development to guide effective 
efforts to predict and intervene in students’ learning. 
Various mathematical representations encompassing continuous
and discrete variables have been proposed to describe
the mechanistic changes and sources of individual differences 
in cognition [28, 9, 32, 21]. From a developmental perspec- 
tive, cognitive abilities develop as a dynamic system with 
reciprocal interactions between the elements of the system 
causing the developmental pathways of each of the elements 
[29]. In the mutualism model of intelligence, elements of a
system interact with each other in a collaborative way to
achieve mutual benefits. This offers an alternative to the
g-factor explanation of the positive manifold, and it requires
only sparse, weak, and even some negative interactions to
produce positive correlations [28].


In the current study, we take advantage of massive time se- 
ries data collected with an online learning environment for 
mathematics [12] and propose a method to fit the mutualism 
model to this dataset. We aim to examine potential recip- 
rocal interactions of mathematical skills in two domains — 
counting and addition — in children’s learning and practic- 
ing mathematics online. We build a model that takes into 
account individual differences in the learning processes by 
allowing individuals to start in different positions and by 
including random effects in key parameters of an otherwise 
group-based mutualism model. Note that this is the first 
application of the nonlinear mutualism differential equation 
model to empirical data, providing an evaluation of how 
well the theoretical account proposed by van der Maas et 
al. [29] can capture changes in children’s mathematical skill 
development over time. In addition, we pioneer the use of 
continuous-time models to analyze irregularly-spaced data 
that arise when students use educational technology in real- 
istic settings, and show that the estimation framework im- 
plemented in the dynr R package [19, 20] can handle nonlin- 
ear equations and mixed effects that explain both between- 
person and within-person differences. 


In summary, the contribution of the work is three-fold: 1) 
providing new evidence of reciprocal interactions in mathematics
skill development by fitting, for the first time, the nonlinear
mutualism model to empirical data; 2) presenting a way to
analyze the irregularly spaced multivariate time series data
commonly seen in learning systems; and 3) demonstrating 
the use of state-space approaches in estimating parameters 
of mixed-effects dynamic models. 


In the following sections, we first explain the mathematical 
model we use to characterize the learning processes, and the 
estimation procedure. We then present how the empirical 
data in the current study were collected, and the sample 
characteristics. The paper ends with a discussion of the re- 
sults and their implications for education. 


2. THE MUTUALISM MODEL 


In biology, the term mutualism refers to a relation between
species populations in which different species interact in ways
that benefit each other and sustain their growth [1].
Biologists routinely use the Lotka-Volterra model [16, 30]
to study the dynamics of such relations. This model inspired
van der Maas and colleagues [29] to propose the same model,
referred to as the mutualism model, for studying the dynamics
of cognitive development, in which elements of a cognitive
system interact with each other to achieve mutual benefit.


2.1 The Lotka-Volterra Model 


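Mathematically, the mutualism model can be expressed using
generalized Lotka-Volterra equations for N interacting elements as

$$
d\mathbf{x}(t) = \mathbf{F}\big(x_1(t), x_2(t), \ldots, x_N(t)\big)\,dt \tag{1}
$$

$$
= \begin{bmatrix}
p_1 x_1(t)\left(1 - \dfrac{x_1(t) + \sum_{j \neq 1} a_{1j}\,x_j(t)}{K_1}\right)\\
\vdots\\
p_N x_N(t)\left(1 - \dfrac{x_N(t) + \sum_{j \neq N} a_{Nj}\,x_j(t)}{K_N}\right)
\end{bmatrix} dt, \tag{2}
$$

where $i, j = 1, 2, \ldots, N$ index the elements of a dynamic
system, and $t$ is continuous time. Here, the elements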
are the counting and addition skills. The differential of vec- 
tor x(t) with respect to t denotes the change in x(t) within 
an infinitely small time interval. 


The model assumes logistic growth. The $p_i$ are growth parameters
that determine the steepness of the logistic growth function
associated with each $x_i(t)$, and the $K_i$ are carrying capacity
parameters that represent the limited resources in the system,
such as the limited attention and working memory one can allocate
to learning. The $a_{ij}$ are interaction parameters that specify
the relations between each pair of $x_i$ and $x_j$ in development.
With all $a_{ij} = 0$, the change of the latent variable $x_i(t)$
follows a simple logistic curve that converges to an equilibrium
state of $K_i$, regardless of its starting position. The system is
collaborative if the Jacobian matrix $\partial \mathbf{F}/\partial \mathbf{x}$
is positive definite, and competitive otherwise. If, for all $i$,
$x_i(t)$ and $p_i$ take only positive values, then it can be shown
that as long as the combined consumption of resources
$x_i(t) + \sum_{j \neq i} a_{ij} x_j(t)$ does not exceed the carrying
capacity $K_i$, $x_i(t)$ will continue to increase toward its
equilibrium. Further, when the interaction parameters $a_{ij}$ are
negative (i.e., $-a_{ij}$ are positive) for all $j \neq i$, $x_i(t)$
can develop even beyond the original carrying capacity $K_i$, as a
benefit of collaboration with the other processes. On the other
hand, when the parameters $a_{ij}$, $j \neq i$, are positive, $x_i(t)$
can never reach its full potential $K_i$, as a loss due to
competition. van der Maas and colleagues [29] showed that when
$-a_{ij}$ is positive and less than 1, the mutualism model can
result in the positive manifold.
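To make the roles of $p_i$, $K_i$, and $a_{ij}$ concrete, the minimal Python sketch below integrates the deterministic part of Eq. (2) with a fixed-step fourth-order Runge-Kutta scheme and compares the end state with the interior equilibrium for $N = 2$, obtained by solving $x_1 + a_{12}x_2 = K_1$ and $x_2 + a_{21}x_1 = K_2$. The parameter values are illustrative assumptions, not estimates from this study, and the solver is a generic choice rather than the one used by the authors.

```python
from typing import List, Sequence


def mutualism_drift(x: Sequence[float], p: Sequence[float], K: Sequence[float],
                    a: Sequence[Sequence[float]]) -> List[float]:
    """Deterministic part of Eq. (2): dx_i/dt = p_i x_i (1 - (x_i + sum_{j != i} a_ij x_j) / K_i)."""
    n = len(x)
    rates = []
    for i in range(n):
        # Combined resource consumption of element i: its own level plus weighted interactions.
        load = x[i] + sum(a[i][j] * x[j] for j in range(n) if j != i)
        rates.append(p[i] * x[i] * (1.0 - load / K[i]))
    return rates


def simulate(x0, p, K, a, t_end, dt=0.1):
    """Integrate the mutualism ODE from x0 to t_end with classical fourth-order Runge-Kutta."""
    def f(z):
        return mutualism_drift(z, p, K, a)

    x = [float(v) for v in x0]
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
        x = [xi + dt / 6.0 * (q1 + 2.0 * q2 + 2.0 * q3 + q4)
             for xi, q1, q2, q3, q4 in zip(x, k1, k2, k3, k4)]
    return x


def equilibrium_2d(K, a12, a21):
    """Interior fixed point for N = 2, solving x1 + a12*x2 = K1 and x2 + a21*x1 = K2."""
    det = 1.0 - a12 * a21
    return ((K[0] - a12 * K[1]) / det, (K[1] - a21 * K[0]) / det)


# Illustrative (assumed) values: equal growth rates and capacities, collaborative a_ij < 0.
K = [10.0, 10.0]
x_end = simulate([1.0, 1.0], p=[0.5, 0.5], K=K, a=[[0.0, -0.3], [-0.3, 0.0]], t_end=200.0)
print(x_end, equilibrium_2d(K, -0.3, -0.3))  # both skills approach 13 / 0.91, about 14.29 > K_i
```

With negative $a_{ij}$ both skills settle above $K_i = 10$; switching the signs to $a_{ij} = 0.3$ (competition) pulls both to $7/0.91 \approx 7.69$, below $K_i$, and $a_{ij} = 0$ recovers plain logistic convergence to $K_i$.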


2.2 State-space Representation 

If we take into account individual differences in the mutualism
model, as well as process noise and measurement errors¹
that may occur alongside the manifestation of the mutual- 
ism process, we obtain a state-space representation of the 
mutualism model: 


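$$
d\mathbf{x}_s(t) = \mathbf{F}_s(\mathbf{x}_s(t))\,dt + d\mathbf{w}_s(t) \tag{3}
$$

$$
\mathbf{F}_s(\mathbf{x}_s(t)) = \begin{bmatrix}
p_1 x_{1,s}(t)\left(1 - \dfrac{x_{1,s}(t) + a_{12}\,x_{2,s}(t)}{K_1 + b_{1,s}}\right)\\[2ex]
p_2 x_{2,s}(t)\left(1 - \dfrac{x_{2,s}(t) + a_{21}\,x_{1,s}(t)}{K_2 + b_{2,s}}\right)
\end{bmatrix} \tag{4}
$$

$$
\mathbf{y}_s(t_{s,k}) = \mathbf{x}_s(t_{s,k}) + \boldsymbol{\epsilon}_s(t_{s,k}) \tag{5}
$$

$$
\boldsymbol{\epsilon}_s(t_{s,k}) \sim N\!\left(\mathbf{0},\ \boldsymbol{\Sigma}_\epsilon = \begin{bmatrix}\sigma^2_{\epsilon_1} & 0\\ 0 & \sigma^2_{\epsilon_2}\end{bmatrix}\right) \tag{6}
$$

where the subscript $s$ indexes individuals, and $k = 1, 2, \ldots, T_s$
indexes the $k$th discrete person-specific measurement occasion
$t_{s,k}$. The vector $\mathbf{x}_s(t)$ contains the latent counting
and addition skills $x_{1,s}(t)$ and $x_{2,s}(t)$ for individual $s$,
manifested as $\mathbf{y}_s(t_{s,k})$ in a measurement model with
serially independent Gaussian measurement errors
$\boldsymbol{\epsilon}_s(t_{s,k})$. The differential of $\mathbf{x}_s(t)$
is determined by the systematic dynamic functions $\mathbf{F}_s(\cdot)$
and the differential of the process noise $\mathbf{w}_s(t)$, which
follows a Wiener process (i.e., a continuous-time version of a random
walk [10]) with diffusion matrix
$\mathbf{Q} = \begin{bmatrix}\sigma^2_{w_1} & 0\\ 0 & \sigma^2_{w_2}\end{bmatrix}$.
Person-specific random effects $\mathbf{b}_s = [b_{1,s}, b_{2,s}]^\top$
are added to the carrying capacity parameters, and are assumed to
follow a normal distribution with mean $\mathbf{0}$ and covariance
matrix
$\boldsymbol{\Sigma}_b = \begin{bmatrix}\sigma^2_{b,11} & \sigma_{b,12}\\ \sigma_{b,12} & \sigma^2_{b,22}\end{bmatrix}$.


The initial condition, or the distribution of the variables at
the first available time point, of the dynamic process
$\mathbf{x}_s(t_{s,1})$ is assumed to follow a multivariate normal
distribution with mean
$\begin{bmatrix}\mu_{1,1}\\ \mu_{1,2}\end{bmatrix}$ and covariance matrix
$\begin{bmatrix}\sigma^2_{1,11} & \sigma_{1,12}\\ \sigma_{1,12} & \sigma^2_{1,22}\end{bmatrix}$.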
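One way to see what Eqs. (3)-(6) imply for data like Math Garden's is to simulate them at irregular occasions. The Python sketch below uses a generic Euler-Maruyama discretization with step `h` between the observation times, adds the person-specific effects to $K_1$ and $K_2$, and adds independent Gaussian measurement error at each occasion. The function name, step size, and all parameter values in the usage lines are illustrative assumptions, not the authors' settings or estimates.

```python
import numpy as np


def simulate_person(t_obs, x0, p, K, a12, a21, b, q_sd, eps_sd, rng, h=0.01):
    """Draw y_s(t_{s,k}) from Eqs. (3)-(6) for one person via Euler-Maruyama (hypothetical helper).

    t_obs: increasing observation times in weeks; x0: latent state at t_obs[0];
    b: person-specific random effects added to K; q_sd, eps_sd: process and measurement SDs.
    """
    x = np.array(x0, dtype=float)
    q_sd, eps_sd = np.asarray(q_sd, dtype=float), np.asarray(eps_sd, dtype=float)
    t, ys = float(t_obs[0]), []
    for t_next in t_obs:
        while t < t_next - 1e-12:  # integrate up to the next (irregularly spaced) occasion
            step = min(h, t_next - t)
            drift = np.array([
                p[0] * x[0] * (1.0 - (x[0] + a12 * x[1]) / (K[0] + b[0])),
                p[1] * x[1] * (1.0 - (x[1] + a21 * x[0]) / (K[1] + b[1])),
            ])
            # Wiener increments have variance equal to the step length.
            x = x + drift * step + q_sd * np.sqrt(step) * rng.standard_normal(2)
            t += step
        ys.append(x + eps_sd * rng.standard_normal(2))  # Eq. (5): add measurement error
    return np.array(ys)


rng = np.random.default_rng(2017)
b = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.1], [0.1, 1.0]])  # illustrative Sigma_b
t_obs = np.array([0.0, 1 / 7, 2 / 7, 1.0, 4.86])                     # irregular gaps in weeks
y = simulate_person(t_obs, x0=[13.8, 12.9], p=[0.08, 0.09], K=[10.0, 10.0],
                    a12=-0.48, a21=-0.58, b=b, q_sd=[0.1, 0.1], eps_sd=[0.3, 0.3], rng=rng)
print(y.shape)  # (5, 2)
```

Because the latent process is defined in continuous time, nothing in the sketch requires equally spaced `t_obs`; long inactivity gaps simply mean more integration steps between observations.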


¹ Process noise is distinct from measurement error in that the
former is associated with random behavior in the underlying
process, whereas the latter depends on the measurement process,
device, and other environmental influences that may affect the
accuracy of the measurements. In an educational context, a
correct guess on an item the child does not know can be seen as
a measurement error, while a child having a good or bad day could
contribute to the process noise. Whereas measurement error does
not influence growth at the next time point, process noise does
steer the dynamical system.


2.3 An Alternative G-factor Model


In order to compare the fit of the mutualism model to empirical
data with that of the g-theory, a comparable state-space
one-factor model without interactions can be developed as


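$$
dx_s(t) = p\,x_s(t)\left(1 - \frac{x_s(t)}{K + b_s}\right)dt + dw_s(t), \tag{7}
$$

$$
x_s(t_{s,1}) \sim N(\mu_{1,1}, \sigma^2_{1,11}), \qquad Q = \sigma^2_{w}, \qquad \Sigma_b = \sigma^2_{b},
$$

$$
\mathbf{y}_s(t_{s,k}) = \begin{bmatrix}1\\ \lambda\end{bmatrix} x_s(t_{s,k}) + \boldsymbol{\epsilon}_s(t_{s,k}), \qquad
\boldsymbol{\epsilon}_s(t_{s,k}) \sim N\!\left(\mathbf{0},\ \boldsymbol{\Sigma}_\epsilon = \begin{bmatrix}\sigma^2_{\epsilon_1} & 0\\ 0 & \sigma^2_{\epsilon_2}\end{bmatrix}\right), \tag{8}
$$

where the observed variables are linearly linked with the
single latent variable through a loading matrix of $[1\ \ \lambda]^\top$.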


3. ESTIMATION 


To estimate the random effects in the models, we augmented
the latent variables $\mathbf{x}_s(t)$ with the random effects
$\mathbf{b}_s$ to yield a new latent variable vector,
$\tilde{\mathbf{x}}_s(t) = [\mathbf{x}_s(t)^\top, \mathbf{b}_s^\top]^\top$.
We then modified the differential equations, the measurement model,
and the initial condition to incorporate this change of
$\tilde{\mathbf{x}}_s(t)$.
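Written out, the augmentation treats $\mathbf{b}_s$ as additional latent states with zero drift and zero diffusion. The block below is one way to express the modified system, assuming that $\mathbf{b}_s$ is constant over time and uncorrelated with the initial latent state; $\boldsymbol{\mu}_1$ and $\boldsymbol{\Sigma}_1$ are shorthand introduced here for the initial mean vector and covariance matrix of $\mathbf{x}_s(t_{s,1})$, and $\boldsymbol{\Sigma}_b$ is the random-effect covariance from Section 2.2.

$$
\tilde{\mathbf{x}}_s(t) = \begin{bmatrix}\mathbf{x}_s(t)\\ \mathbf{b}_s\end{bmatrix}, \qquad
d\tilde{\mathbf{x}}_s(t) = \begin{bmatrix}\mathbf{F}_s(\mathbf{x}_s(t))\\ \mathbf{0}\end{bmatrix} dt + \begin{bmatrix}d\mathbf{w}_s(t)\\ \mathbf{0}\end{bmatrix},
$$

$$
\mathbf{y}_s(t_{s,k}) = \begin{bmatrix}\mathbf{I}_2 & \mathbf{0}\end{bmatrix}\tilde{\mathbf{x}}_s(t_{s,k}) + \boldsymbol{\epsilon}_s(t_{s,k}), \qquad
\tilde{\mathbf{x}}_s(t_{s,1}) \sim N\!\left(\begin{bmatrix}\boldsymbol{\mu}_1\\ \mathbf{0}\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_1 & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_b\end{bmatrix}\right).
$$

The random effects still shape the trajectory through $K_i + b_{i,s}$ inside $\mathbf{F}_s$, so a filter can learn about each $\mathbf{b}_s$ from the observed ratings even though $\mathbf{b}_s$ itself never changes.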


We used the dynr R package [19, 20] to estimate the param- 
eters in the mixed-effects mutualism model, as well as the 
baseline g-factor model, by numerically optimizing an ap- 
proximate log-likelihood function obtained as a by-product 
of the continuous-discrete extended Kalman filter [15]. Akaike 
Information Criterion (AIC) and Bayesian Information Criterion
(BIC) values were computed to compare the models. Details
of the estimation algorithms can be found in [4, 20].
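A continuous-discrete extended Kalman filter alternates between propagating the mean and covariance of the latent state through the differential equations between occasions and updating them when ratings arrive, accumulating the log-likelihood from the prediction errors. The Python sketch below illustrates this for the augmented two-skill model under simplifying assumptions: forward-difference Jacobians, a fixed number of Euler sub-steps per interval, and `NaN` marking a domain that was not played on a given day. It is not dynr's implementation, and the helper names (`cd_ekf_loglik`, `n_sub`, `aic_bic`) are hypothetical; `aic_bic` uses the standard definitions, with the choice of $n$ for BIC left to the user.

```python
import numpy as np


def drift(z, p, K, a12, a21):
    """Augmented drift for z = [x1, x2, b1, b2]; random effects b enter K and have zero drift."""
    x1, x2, b1, b2 = z
    return np.array([
        p[0] * x1 * (1.0 - (x1 + a12 * x2) / (K[0] + b1)),
        p[1] * x2 * (1.0 - (x2 + a21 * x1) / (K[1] + b2)),
        0.0,
        0.0,
    ])


def num_jacobian(f, z, eps=1e-6):
    """Forward-difference Jacobian of f at z."""
    f0 = f(z)
    J = np.empty((z.size, z.size))
    for j in range(z.size):
        dz = np.zeros(z.size)
        dz[j] = eps
        J[:, j] = (f(z + dz) - f0) / eps
    return J


def cd_ekf_loglik(times, Y, f, m0, P0, Q, R, n_sub=20):
    """Approximate log-likelihood of one person's series from a continuous-discrete EKF.

    times: (T,) occasions in weeks; Y: (T, 2) ratings, NaN where a domain was not played.
    m0, P0: mean/covariance of the augmented state at times[0]; Q: (4, 4) diffusion with
    zeros for b; R: (2, 2) measurement error covariance. Hypothetical helper, not dynr's code.
    """
    H_full = np.hstack([np.eye(2), np.zeros((2, 2))])  # only x1 and x2 are measured
    m, P = np.array(m0, dtype=float), np.array(P0, dtype=float)
    loglik, t_prev = 0.0, times[0]
    for t, y in zip(times, Y):
        h = (t - t_prev) / n_sub
        for _ in range(n_sub):  # Euler sub-steps of dm/dt = f(m), dP/dt = J P + P J' + Q
            J = num_jacobian(f, m)
            m, P = m + h * f(m), P + h * (J @ P + P @ J.T + Q)
        t_prev = t
        obs = ~np.isnan(y)
        if not obs.any():
            continue
        H, R_obs = H_full[obs], R[np.ix_(obs, obs)]
        v = y[obs] - H @ m                  # innovation (prediction error)
        S = H @ P @ H.T + R_obs             # innovation covariance
        G = np.linalg.solve(S, H @ P).T     # Kalman gain P H' S^{-1} (S, P symmetric)
        m, P = m + G @ v, P - G @ H @ P
        loglik -= 0.5 * (obs.sum() * np.log(2.0 * np.pi) + np.linalg.slogdet(S)[1]
                         + v @ np.linalg.solve(S, v))
    return loglik


def aic_bic(loglik, n_params, n_obs):
    """Standard definitions: AIC = -2LL + 2k and BIC = -2LL + k ln(n)."""
    return -2.0 * loglik + 2.0 * n_params, -2.0 * loglik + n_params * np.log(n_obs)
```

In a mixed-effects setting each person's series would be filtered separately with the same structural parameters, and the per-person log-likelihoods summed before being handed to a numerical optimizer.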


[Figure 1 panels: Addition, Counting]


Figure 1: Screen shots of the counting and addition 
games in the Math Garden. Children give responses 
by clicking an option. The coins at the bottom dis- 
appear one per second, and reflect the scoring rule 
based on accuracy and response time. 


4. EMPIRICAL STUDY 


Here, we describe an application of the mutualism model. 


4.1 Math Garden 


We sampled data from a popular Dutch online adaptive
practice and monitoring system called Math Garden [13]. 
The system consists of games that measure different math- 
ematical skills, including counting and addition, as players 
practice their arithmetic skills through answering items. Fig- 
ure 1 shows screen shots of two example items. 


The system applies an explicit scoring rule for both speed 
and accuracy [17], visible to players as the number of coins 
they collect. For each item, a limited number of coins can be
collected, and the number decreases by one for each additional
second used to come up with the answer. In case of


a correct answer, the score equals the remaining time. If 
the answer is incorrect, the score is the negative remaining 
time. The scoring rule takes the speed-accuracy trade-off into
account, penalizes quick but incorrect answers, and encour- 
ages thoughtful responses. 
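A minimal sketch of the scoring rule described above follows (rules of this form are usually called signed residual time rules). The 20-second limit in the usage lines is a hypothetical value chosen for illustration, and clamping the remaining time at zero once the limit has passed is an assumption.

```python
def srt_score(correct: bool, response_time: float, time_limit: float) -> float:
    """Score = remaining time if correct, minus the remaining time if incorrect.

    Remaining time is clamped at zero (assumption: no coins are left after the limit).
    """
    remaining = max(0.0, time_limit - response_time)
    return remaining if correct else -remaining


# Hypothetical 20-second limit: fast errors cost the most, slow answers count little either way.
for correct, rt in [(True, 5.0), (False, 5.0), (True, 18.0), (False, 18.0)]:
    print(correct, rt, srt_score(correct, rt, time_limit=20.0))
```

The asymmetry is what discourages guessing: a quick wrong answer after 5 seconds loses as many coins as a quick right answer gains, whereas a slow, uncertain response risks little.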


Skill ratings and item difficulties are estimated on the fly
using the Elo algorithm [6], which was originally developed for
chess competitions between two players and has since been
adapted for pairing a player with an item [13]. The skill
and difficulty estimates for a player and an item are up- 
dated at each “match” they are involved in, depending on 
the weighted difference between observed and expected cor- 
rectness, the latter of which is entailed by the measurement 
model [17]. Evidence has shown high validity and reliability 
of the skill and difficulty estimates [13]. 
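The following Python sketch shows the structure of one Elo-style update. For simplicity it assumes a logistic (Rasch-type) expected probability of a correct answer and hypothetical step sizes `k_player` and `k_item`; Math Garden's actual expectation follows from its speed-accuracy measurement model [17], so this illustrates only the update logic.

```python
import math


def expected_correct(theta: float, beta: float) -> float:
    """Assumed logistic (Rasch-type) probability that skill theta solves an item of difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def elo_update(theta: float, beta: float, score: float,
               k_player: float = 0.3, k_item: float = 0.3):
    """One player-item 'match': both ratings move by the weighted (observed - expected) residual.

    score is the observed correctness (1 = correct, 0 = incorrect); k_player and k_item are
    hypothetical step sizes, not Math Garden's settings.
    """
    residual = score - expected_correct(theta, beta)
    return theta + k_player * residual, beta - k_item * residual


# A correct answer raises the player's rating and lowers the item's difficulty estimate.
theta, beta = elo_update(0.0, 0.0, score=1.0)
print(theta, beta)
```

Because player and item move in opposite directions by the same residual, an unexpectedly correct answer on a hard item moves both estimates more than an expected one would.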


In the current study, our observed data are the continuous 
end-of-day skill ratings in different domains, rather than bi- 
nary correctness for each item. Comparisons of Math Gar- 
den’s underlying measurement model [17] and the adapted 
Elo algorithm [13] to other common models for binary
responses in educational data mining (the Rasch model [23],
additive factor models [3], performance factor analysis [22],
and Bayesian knowledge tracing models [5]) are worth
exploring, but beyond the scope of this paper.


4.2 Data Description 

We selected a sample of children in grades 3-6, between the 
ages of 6 and 10 years old, who practiced counting and ad- 
dition skills during at least 4 different months in the school 
year from September 2016 to July 2017, and had played on at
least 20 different days in each domain, with a minimum of 10
items per day. We excluded children whose parents did not
consent to participation in Math Garden-related scientific
research. The research was approved by the Ethics Committee
of the psychology department of the University of Amsterdam.
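The selection rules above can be expressed as a filter over per-day practice records. In this hypothetical sketch a day counts toward the 20-day minimum only if at least 10 items were answered in that domain, and the four-month rule is applied to practice in either domain; both readings are assumptions, and the grade, age, and consent criteria are not modeled.

```python
from collections import defaultdict
from datetime import date


def eligible_children(records, domains=("counting", "addition"),
                      min_items=10, min_days=20, min_months=4):
    """Return the ids of children who meet an (assumed) reading of the selection rules.

    records: iterable of (child_id, domain, day: datetime.date, n_items answered that day).
    """
    good_days = defaultdict(set)   # (child, domain) -> days with at least min_items items
    months = defaultdict(set)      # child -> (year, month) pairs with practice in any domain
    for child, domain, day, n_items in records:
        if domain not in domains:
            continue
        months[child].add((day.year, day.month))
        if n_items >= min_items:
            good_days[(child, domain)].add(day)
    return {
        child for child in months
        if len(months[child]) >= min_months
        and all(len(good_days[(child, d)]) >= min_days for d in domains)
    }


# Hypothetical child 1: 20 twelve-item days per domain spread over September-December 2016.
demo = [(1, d, date(2016, 9 + i // 5, 1 + i % 5), 12)
        for d in ("counting", "addition") for i in range(20)]
print(eligible_children(demo))  # {1}
```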


The resulting sample included a total of 2485 children, 51.07% 
male. The average age at which a child started to use Math 
Garden for practicing counting and addition during the school 
year was 7.23 years old (SD = 1.03). The original skill ratings
could be negative, so we shifted them to the positive range by
adding 20 and 25 to the counting and addition scales,
respectively. This shift leaves the fluctuations and variability
of the ratings over time unchanged. From the second-by-second
time stamps of the data points, we constructed continuous
measures of time in which each unit represents a week.
Figure 2 shows the shifted skill ratings for three randomly 
selected individuals over time. 
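The preprocessing steps described above (end-of-day ratings, domain-specific shifts, and time in weeks) can be sketched as follows. Taking the last rating of each calendar day as the end-of-day value, stamping it with the time of that rating, and measuring weeks from the child's first event are assumptions made for illustration; only the shifts of 20 and 25 come from the text.

```python
from datetime import datetime

SHIFT = {"counting": 20.0, "addition": 25.0}  # offsets reported in the text
SECONDS_PER_WEEK = 7 * 24 * 3600


def end_of_day_series(events):
    """Build per-domain (weeks, shifted rating) series for one child.

    events: iterable of (timestamp: datetime, domain: str, rating: float).
    Assumptions: the last rating of a calendar day is the end-of-day value, it is stamped with
    that rating's time, and week 0 is the child's first event.
    """
    events = sorted(events, key=lambda e: e[0])
    t0 = events[0][0]
    last = {}  # (domain, calendar day) -> (timestamp, rating); later events overwrite earlier ones
    for ts, domain, rating in events:
        last[(domain, ts.date())] = (ts, rating)
    series = {}
    for (domain, _day), (ts, rating) in sorted(last.items(), key=lambda kv: kv[1][0]):
        weeks = (ts - t0).total_seconds() / SECONDS_PER_WEEK
        series.setdefault(domain, []).append((weeks, rating + SHIFT[domain]))
    return series


demo = [
    (datetime(2016, 9, 5, 9, 0), "counting", -6.0),
    (datetime(2016, 9, 5, 15, 0), "counting", -5.5),  # same day: replaces the 9:00 rating
    (datetime(2016, 9, 12, 9, 0), "counting", -5.0),  # exactly one week after the first event
]
print(end_of_day_series(demo))
```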


Distributions of the initial and ending skill ratings are plot- 
ted in Figure 3. At the first available time point for each 
individual, the counting skill ratings for all sampled children 
had a mean of 13.84 and a variance of 1.51, whereas the ad- 
dition skill ratings had mean 12.94 and a larger variance of 
10.56. At the last available time point for each individual, 
the ending counting estimates had a mean of 14.83 and vari- 
ance of 1.49, while the ending addition estimates had a mean 
of 15.92 and variance of 9.05. Generally speaking, during the 
school year, more development was observed in the addition
skill than in the counting skill. There was also more variability
in children’s initial and ending addition skill ratings than in
the counting domain.


Figure 2: Individual time series data of counting and
addition skill ratings for three children.


Figure 3: Histograms of the initial and ending skill
ratings.


The correlation between the initial addition and counting 
skill ratings was .73, and the correlation between the ending 
ratings was .79, confirming the positive manifold. Figure 4 
shows the boxplot of within-person correlations between the 
skill ratings in the two domains. The mean of the within-person
correlations was .55 (SD = 0.38). However, some within-person
correlations were significantly negative at the .05 level, based
on asymptotic p-values computed with the Hmisc R package [8].
For example, in Figure 2, the child with identification number
1344 showed growth in the addition skill ratings and a decline
in the counting skill ratings. The decline may be due to an
unexpected bump in the skill ratings above the child’s
equilibrium, followed by a return to that equilibrium. Another
possible explanation is that the child learned and practiced
counting at school before addition, but forgetting took place as
the child started to learn addition and practiced counting less.
In such a case, the skills in different domains would compete
for attention and learning time instead of collaborating. Either
case can be captured by the mutualism model.


The length of the individual time series of the counting skill 
ratings ranged from 20 to 177 days with a median of 29 
days, and that of the addition skill ratings ranged from 20 
to 192 days with a median of 42 days. The length of the 
interval between two time points represents the inactivity 
gap between two practice days of an individual for a math- 
ematical skill, and ranged from 1 day to 18.29 weeks. The 
minimal gap length of a single time series had a median of 1 
day across the sample in both domains, whereas the median 
maximum gap length was 4.86 weeks in the counting domain 
and 4 weeks in the addition domain.


Figure 4: A boxplot of the within-person correlations of the
addition and counting skill ratings.
[Axis label: Within-person correlations between the ability ratings]


In Figure 5, the data of ten randomly selected individuals
illustrate the irregularly spaced measurement occasions, as well
as the unbalanced practice in each domain on a single day and
across time. The mutualism model assumes a continuous,
integrative process of change even though we do not have
measurements of each skill at all times.
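The gap statistics reported above can be reproduced from sorted practice times with a few lines. The input layout (one sorted list of practice times in weeks per child, for a single domain) is a hypothetical choice for illustration.

```python
import statistics


def gap_summary(times_by_child):
    """Median (across children) of each series' shortest and longest gap between practice days.

    times_by_child: hypothetical layout {child_id: sorted practice times in weeks, >= 2 per child}.
    """
    min_gaps, max_gaps = [], []
    for times in times_by_child.values():
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        min_gaps.append(min(gaps))
        max_gaps.append(max(gaps))
    return statistics.median(min_gaps), statistics.median(max_gaps)
```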
Figure 5: An illustration of the irregularly spaced
time intervals of ten randomly selected individuals. 
Different colors represent the domains that an indi- 
vidual practiced during a specific day. 


4.3 Empirical Results 
The parameter estimates and model fit indices of both the
mutualism model and the g-factor model are summarized in
Table 1. All parameters were estimated to be significantly
different from zero (p < .05). The estimates of the initial
condition parameters ($\mu_{1,1}$, $\mu_{1,2}$, $\sigma^2_{1,11}$,
$\sigma_{1,12}$, and $\sigma^2_{1,22}$) in both models were consistent
with the sample mean and variance of the initial states. With
lower AIC and BIC values, the mutualism model provided a better
fit to the data than the g-factor model. Figure 6 shows the fit
of the mutualism model to the observed data of four randomly
selected individuals. The fitted trajectories captured the
changes in the observed paths of these individuals in both
domains, suggesting a reasonable fit of the model to the data.
In the mutualism model, the steepness parameters $p_1$ and $p_2$
were estimated to be close to zero, indicating that the overall
development in skills was small and slow. The group-level
equilibrium states $K_1$ and $K_2$, which apply when there is no
interaction between the processes, were estimated to be about 10,
but individual differences captured by the random effects $b_1$
and $b_2$ contribute to an estimated covariance of
$\begin{bmatrix}1.04 & 0.09\\ 0.09 & 1.06\end{bmatrix}$.
Estimates of the interaction parameters $a_{12}$ and $a_{21}$ were
found to be significantly negative, so the interactions between
counting and addition ratings had a positive effect on their level
changes. These results indicate that counting and addition skills
collaborate, instead of competing, to form a positive manifold in
the long run.


Table 1: Parameter estimates (standard errors) and
model fit indices

Parameter | Mutualism Model | g-factor Model
$p_1$ | 0.08 (0.002) | 0.02 (0.001)
$p_2$ | 0.09 (0.001) |
$a_{12}$ | -0.48 (0.005) |
$a_{21}$ | -0.58 (0.004) |


Figure 6: Observed and fitted skill ratings from the
mutualism model.


In summary, we found beneficial interactions between children’s
addition and counting skill ratings: being better at one skill
helps a child become better at the other. The mutualism model fit
the data better than the g-factor model. Individual differences
are present in the data both in the starting positions of the
change trajectories and in the key model parameters that represent
limited resources in the system, which, according to [29],
provides potential evidence for both the g-theory and the
mutualism model of general intelligence. We concur with van der
Maas and colleagues [29] that individual differences cannot be
ignored in educational applications.


5. CONCLUSIONS 


In this paper, we presented a state-space expression of the 
continuous-time mutualism model proposed by [29] where 
individual differences, process noise, and measurement er- 
rors were taken into account. The mutualism model allowed 
us to examine the mechanism underlying skill development
from a micro, within-person perspective. We fitted the theoretical
model to empirical data naturally collected online in authen- 
tic educational settings. Results showed that improvement 
in addition skill could positively influence the development 
in the counting domain, and vice versa. The better fit of 
the mutualism model to the data compared to the g-factor 
model suggested that the collaboration between the count- 
ing and addition skills in their co-development served as a 
better interpretation of the observed positive manifold. 


The characteristics of the time series data in the current 
study are not uncommon in education as digital technology 
has transformed our way of collecting data about learning. 
The paper illustrates one way to fit dynamic models to the
noisy, irregularly spaced multivariate data that are abundant
in real-world learning settings. We see potential in applying
the current method to other learning data to improve our
understanding of cognitive and non-cognitive development.


Nevertheless, this work has limitations that future work should 
aim to overcome. First, only two variables were considered 
in the current sample, while the mutualism model could be 
extended to multiple dimensions. The estimation algorithm 
is well suited for multivariate time series data, but the in- 
terpretation of the multivariate model can become compli- 
cated. Second, the estimation framework permits only a lim- 
ited number of random effects in the current study [18]. In 
addition to the two carrying capacity parameters, one may 
be interested in adding random effects in the interaction pa- 
rameters because of the potential competition between skills 
under time and attention constraints as we discussed above. 
The limitation of the estimation framework may be circum- 
vented by utilizing sampling-based algorithms although they 
may be computationally heavy. 


A good fit of the model to the data does not rule out other
plausible interpretations of cognitive development. Intervention
studies with deliberate experimental designs are
needed to establish causal relations in a dynamic system. 
These interventions may take the form of randomized as- 
signment of skills to practice, for example, with groups of 
students assigned to practice only counting or only addition, 
but with progress measured on both skills after some period 
of practice. The cross-skill influence of practice can then be 
evaluated relative to practiced skill improvement. 


Future work should also aim to evaluate how the mutual- 
ism account of skill development relates to other findings in 
education. For example, evidence suggests that interleaving 
practice on different problem types produces more robust 
learning and generalization than does blocking practice by 
problem type [31, 24]. It is possible that some of the benefit 
from interleaving relates to mutualism, with practice from 
different problem types influencing the development of the 
other skills. 